This report analyzes blood samples taken from patients in Wuhan who were diagnosed with COVID-19. The data were collected between 10.01.2020 and 18.02.2020.
The main purpose of this report is to find out which blood components can be used to predict whether a patient is going to die or recover.
The analyses show that men are less likely to survive a COVID-19 infection. More than every second man dies, at least in the analyzed dataset. The situation is not as dire for women: the analysis shows that “only” every third does not survive the infection.
Another important factor is age: the report shows that the older the patient, the higher the probability of not surviving the illness.
The report also examines the impact of hospitalization time on fatality. It shows that the longer a patient stays in the hospital, the higher the chance of survival. However, after 20 days in the hospital the probability of death rises again.
The analysis also covers the biomarkers suggested in the Tan et al. article, that is, we checked the correlation between LDH, CRP, lymphocyte count and fatality. This correlation shows that the higher the values of LDH and CRP, the higher the probability of dying. A high lymphocyte count, however, may suggest a chance of recovery. Finally, a classification model that takes all of the parameters into consideration supports the Tan et al. article: LDH, CRP and lymphocyte count have an impact on fatality.
Libraries used: readxl, dplyr, tidyr, stringr, ggplot2, plotly, corrplot, caret.
As mentioned in the introduction, the report analyzes data on patients from a Wuhan hospital. The data consist of records for 375 patients (224 men and 151 women).
Each patient is described by multiple rows. Each row contains some generic properties, such as age, gender, admission time and so on. Additionally, each row represents a separate blood test, whose results populate the corresponding columns. A patient may not have been tested for some properties, in which case those values are missing.
To condense each patient into one row, the researcher calculated the mean of the corresponding results, ignoring missing values.
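The per-patient averaging described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the report's own code. The `PATIENT_ID` key, the helper name `collapse_by_patient` and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from statistics import fmean

def collapse_by_patient(rows, id_key="PATIENT_ID"):
    """Average each numeric column per patient, skipping missing (None) values."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            # Touch the column even when the value is missing,
            # so that it still appears in the output.
            values = grouped[row[id_key]][column]
            if value is not None:
                values.append(value)
    return {
        patient: {col: (fmean(vals) if vals else None) for col, vals in columns.items()}
        for patient, columns in grouped.items()
    }

# Hypothetical rows: one blood test per row, None marks a test that was not run.
rows = [
    {"PATIENT_ID": 1, "LACTATE_DEHYDROGENASE": 300.0, "LYMPHOCYTE_COUNT": None},
    {"PATIENT_ID": 1, "LACTATE_DEHYDROGENASE": 500.0, "LYMPHOCYTE_COUNT": 0.8},
    {"PATIENT_ID": 2, "LACTATE_DEHYDROGENASE": None, "LYMPHOCYTE_COUNT": None},
]
per_patient = collapse_by_patient(rows)
```

A patient with no measurement at all for a column keeps a missing value after this step, so a separate imputation step is still needed.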
To handle the remaining missing values, the researcher replaced them with the median of the corresponding column. The only exception was the NCOV_NUCLEIC_ACID_DETECTION column, where missing values were replaced with 0.
The following table contains summary statistics for each column of the dataset.
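The imputation rule can be illustrated with a short Python sketch, again using only the standard library and not taken from the report's code. The helper name `impute_missing` and the sample records are hypothetical. The zero-filled column uses the name shown in the summary table.

```python
from statistics import median

# Column filled with 0 instead of the median (name as in the summary table).
ZERO_FILLED = ("X2019.NCOV_NUCLEIC_ACID_DETECTION",)

def impute_missing(patients, zero_filled=ZERO_FILLED):
    """Replace None with the column median, or with 0 for the zero-filled columns."""
    columns = list(dict.fromkeys(col for record in patients for col in record))
    fill = {}
    for col in columns:
        observed = [r[col] for r in patients if r.get(col) is not None]
        if col in zero_filled:
            fill[col] = 0
        else:
            # A column with no observed values at all stays missing.
            fill[col] = median(observed) if observed else None
    return [
        {col: (r[col] if r.get(col) is not None else fill[col]) for col in columns}
        for r in patients
    ]

# Hypothetical per-patient records after averaging.
patients = [
    {"UREA": 4.0, "X2019.NCOV_NUCLEIC_ACID_DETECTION": -1},
    {"UREA": None, "X2019.NCOV_NUCLEIC_ACID_DETECTION": None},
    {"UREA": 10.0, "X2019.NCOV_NUCLEIC_ACID_DETECTION": 1},
    {"UREA": 5.0, "X2019.NCOV_NUCLEIC_ACID_DETECTION": None},
]
filled = impute_missing(patients)
```

The median is computed from observed values only, so the imputed entries do not shift it.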
| DAYS_COUNT | AGE | GENDER | DISCHARGE_TIME | HAS_SURVIVED | HYPERSENSITIVE_CARDIAC_TROPONINI | HEMOGLOBIN | SERUM_CHLORIDE | PROTHROMBIN_TIME | PROCALCITONIN | EOSINOPHILS… | INTERLEUKIN_2_RECEPTOR | ALKALINE_PHOSPHATASE | ALBUMIN | BASOPHIL… | INTERLEUKIN_10 | TOTAL_BILIRUBIN | PLATELET_COUNT | MONOCYTES… | ANTITHROMBIN | INTERLEUKIN_8 | INDIRECT_BILIRUBIN | RED_BLOOD_CELL_DISTRIBUTION_WIDTH | NEUTROPHILS… | TOTAL_PROTEIN | QUANTIFICATION_OF_TREPONEMA_PALLIDUM_ANTIBODIES | PROTHROMBIN_ACTIVITY | HBSAG | MEAN_CORPUSCULAR_VOLUME | HEMATOCRIT | WHITE_BLOOD_CELL_COUNT | TUMOR_NECROSIS_FACTOR.U.0391. | MEAN_CORPUSCULAR_HEMOGLOBIN_CONCENTRATION | FIBRINOGEN | INTERLEUKIN_1.U.0392. | UREA | LYMPHOCYTE_COUNT | PH_VALUE | RED_BLOOD_CELL_COUNT | EOSINOPHIL_COUNT | CORRECTED_CALCIUM | SERUM_POTASSIUM | GLUCOSE | NEUTROPHILS_COUNT | DIRECT_BILIRUBIN | MEAN_PLATELET_VOLUME | FERRITIN | RBC_DISTRIBUTION_WIDTH_SD | THROMBIN_TIME | X…LYMPHOCYTE | HCV_ANTIBODY_QUANTIFICATION | D.D_DIMER | TOTAL_CHOLESTEROL | ASPARTATE_AMINOTRANSFERASE | URIC_ACID | HCO3. | CALCIUM | AMINO.TERMINAL_BRAIN_NATRIURETIC_PEPTIDE_PRECURSOR.NT.PROBNP. | LACTATE_DEHYDROGENASE | PLATELET_LARGE_CELL_RATIO | INTERLEUKIN_6 | FIBRIN_DEGRADATION_PRODUCTS | MONOCYTES_COUNT | PLT_DISTRIBUTION_WIDTH | GLOBULIN | X.U.0393..GLUTAMYL_TRANSPEPTIDASE | INTERNATIONAL_STANDARD_RATIO | BASOPHIL_COUNT… | X2019.NCOV_NUCLEIC_ACID_DETECTION | MEAN_CORPUSCULAR_HEMOGLOBIN | ACTIVATION_OF_PARTIAL_THROMBOPLASTIN_TIME | HIGH_SENSITIVITY_C.REACTIVE_PROTEIN | HIV_ANTIBODY_QUANTIFICATION | SERUM_SODIUM | THROMBOCYTOCRIT | ESR | GLUTAMIC.PYRUVIC_TRANSAMINASE | EGFR | CREATININE | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. : 0.00 | Min. :18.00 | Length:375 | Min. :2020-01-23 09:09:23 | Mode :logical | Min. : 1.9 | Min. : 61.8 | Min. : 74.60 | Min. :11.50 | Min. : 0.0200 | Min. :0.0000 | Min. : 65.5 | Min. : 17.00 | Min. :18.55 | Min. :0.0000 | Min. : 5.00 | Min. : 2.75 | Min. : -1.0 | Min. : 0.500 | Min. : 42.00 | Min. : 5.00 | Min. : 1.100 | Min. :10.70 | Min. : 1.80 | Min. :47.20 | Min. : 0.0200 | Min. : 25.00 | Min. : 0.000 | Min. : 61.95 | Min. :17.56 | Min. : 0.716 | Min. : 4.000 | Min. :299.0 | Min. :0.550 | Min. : 5.000 | Min. : 1.700 | Min. : 0.0250 | Min. :5.000 | Min. : 1.850 | Min. :0.000000 | Min. :2.070 | Min. :3.130 | Min. : 1.000 | Min. : 0.320 | Min. : 1.600 | Min. : 8.50 | Min. : 17.8 | Min. : 31.30 | Min. :13.60 | Min. : 0.150 | Min. :0.02000 | Min. : 0.210 | Min. :1.004 | Min. : 7.667 | Min. : 84.2 | Min. :10.00 | Min. :1.780 | Min. : 5 | Min. : 116.0 | Min. :11.20 | Min. : 1.50 | Min. : 4.00 | Min. : 0.0300 | Min. : 8.20 | Min. :18.50 | Min. : 7.00 | Min. :0.840 | Min. :0.00000 | Min. :-1.0000 | Min. :20.80 | Min. : 21.80 | Min. : 0.10 | Min. :0.05000 | Min. :119.1 | Min. :0.0100 | Min. : 1.00 | Min. : 5.00 | Min. : 2.15 | Min. : 12.50 | |
| 1st Qu.: 5.00 | 1st Qu.:46.00 | Class :character | 1st Qu.:2020-02-11 13:39:21 | FALSE:174 | 1st Qu.: 3.7 | 1st Qu.:114.2 | 1st Qu.: 99.12 | 1st Qu.:13.57 | 1st Qu.: 0.0450 | 1st Qu.:0.0000 | 1st Qu.: 615.0 | 1st Qu.: 56.00 | 1st Qu.:29.35 | 1st Qu.:0.1000 | 1st Qu.: 5.10 | 1st Qu.: 7.50 | 1st Qu.:132.8 | 1st Qu.: 3.767 | 1st Qu.: 86.42 | 1st Qu.: 14.25 | 1st Qu.: 4.129 | 1st Qu.:12.03 | 1st Qu.:64.70 | 1st Qu.:62.82 | 1st Qu.: 0.0400 | 1st Qu.: 70.63 | 1st Qu.: 0.000 | 1st Qu.: 87.00 | 1st Qu.:34.04 | 1st Qu.: 5.484 | 1st Qu.: 7.950 | 1st Qu.:336.0 | 1st Qu.:3.690 | 1st Qu.: 5.000 | 1st Qu.: 4.000 | 1st Qu.: 0.5675 | 1st Qu.:6.250 | 1st Qu.: 3.927 | 1st Qu.:0.001548 | 1st Qu.:2.270 | 1st Qu.:4.042 | 1st Qu.: 5.771 | 1st Qu.: 3.337 | 1st Qu.: 3.300 | 1st Qu.:10.22 | 1st Qu.: 620.9 | 1st Qu.: 38.70 | 1st Qu.:16.00 | 1st Qu.: 5.342 | 1st Qu.:0.05000 | 1st Qu.: 0.570 | 1st Qu.:3.023 | 1st Qu.: 21.000 | 1st Qu.:202.9 | 1st Qu.:21.38 | 1st Qu.:2.020 | 1st Qu.: 138 | 1st Qu.: 226.2 | 1st Qu.:26.35 | 1st Qu.: 16.04 | 1st Qu.: 4.90 | 1st Qu.: 0.3200 | 1st Qu.:11.20 | 1st Qu.:30.42 | 1st Qu.: 22.00 | 1st Qu.:1.030 | 1st Qu.:0.01000 | 1st Qu.:-1.0000 | 1st Qu.:29.86 | 1st Qu.: 37.00 | 1st Qu.: 11.75 | 1st Qu.:0.08000 | 1st Qu.:138.1 | 1st Qu.:0.1600 | 1st Qu.: 18.00 | 1st Qu.: 17.67 | 1st Qu.: 68.92 | 1st Qu.: 59.00 | |
| Median :10.00 | Median :62.00 | Mode :character | Median :2020-02-16 17:40:07 | TRUE :201 | Median : 13.0 | Median :125.8 | Median :101.80 | Median :14.31 | Median : 0.1000 | Median :0.2500 | Median : 693.5 | Median : 69.29 | Median :33.15 | Median :0.2000 | Median : 6.20 | Median : 10.59 | Median :197.0 | Median : 6.483 | Median : 88.00 | Median : 16.60 | Median : 5.500 | Median :12.54 | Median :76.40 | Median :66.72 | Median : 0.0500 | Median : 86.00 | Median : 0.010 | Median : 89.94 | Median :36.77 | Median : 7.780 | Median : 8.425 | Median :343.3 | Median :4.365 | Median : 5.000 | Median : 5.434 | Median : 0.9137 | Median :6.500 | Median : 4.437 | Median :0.020000 | Median :2.360 | Median :4.337 | Median : 6.990 | Median : 5.473 | Median : 4.625 | Median :10.80 | Median : 753.0 | Median : 40.62 | Median :16.64 | Median :14.700 | Median :0.06000 | Median : 1.345 | Median :3.636 | Median : 29.000 | Median :245.7 | Median :23.52 | Median :2.110 | Median : 318 | Median : 306.4 | Median :30.91 | Median : 21.09 | Median : 5.70 | Median : 0.4225 | Median :12.60 | Median :32.98 | Median : 33.20 | Median :1.100 | Median :0.01500 | Median :-1.0000 | Median :30.87 | Median : 39.50 | Median : 49.00 | Median :0.09000 | Median :139.9 | Median :0.2150 | Median : 30.00 | Median : 25.00 | Median : 89.38 | Median : 75.49 | |
| Mean :10.86 | Mean :58.83 | NA | Mean :2020-02-15 16:42:59 | NA | Mean : 589.3 | Mean :125.2 | Mean :102.38 | Mean :15.53 | Mean : 0.7515 | Mean :0.6588 | Mean : 832.4 | Mean : 81.61 | Mean :33.12 | Mean :0.2218 | Mean : 10.63 | Mean : 15.09 | Mean :195.3 | Mean : 6.612 | Mean : 87.90 | Mean : 45.66 | Mean : 6.668 | Mean :12.97 | Mean :75.78 | Mean :66.35 | Mean : 0.1108 | Mean : 82.78 | Mean : 6.182 | Mean : 89.91 | Mean :36.86 | Mean : 12.882 | Mean : 10.263 | Mean :343.6 | Mean :4.427 | Mean : 5.903 | Mean : 8.458 | Mean : 1.0969 | Mean :6.468 | Mean : 7.747 | Mean :0.038627 | Mean :2.346 | Mean :4.401 | Mean : 8.341 | Mean : 7.280 | Mean : 8.429 | Mean :10.88 | Mean : 1170.5 | Mean : 41.87 | Mean :17.43 | Mean :16.592 | Mean :0.09812 | Mean : 5.771 | Mean :3.667 | Mean : 47.028 | Mean :279.3 | Mean :22.89 | Mean :2.103 | Mean : 2006 | Mean : 454.2 | Mean :31.57 | Mean : 71.26 | Mean : 27.86 | Mean : 0.5537 | Mean :12.97 | Mean :33.19 | Mean : 49.89 | Mean :1.227 | Mean :0.01729 | Mean :-0.1627 | Mean :30.91 | Mean : 40.62 | Mean : 69.16 | Mean :0.09712 | Mean :140.7 | Mean :0.2144 | Mean : 32.97 | Mean : 38.02 | Mean : 84.31 | Mean : 104.74 | |
| 3rd Qu.:16.00 | 3rd Qu.:70.00 | NA | 3rd Qu.:2020-02-19 11:47:14 | NA | 3rd Qu.: 31.9 | 3rd Qu.:137.3 | 3rd Qu.:104.29 | 3rd Qu.:15.90 | 3rd Qu.: 0.3050 | 3rd Qu.:0.9000 | 3rd Qu.: 780.5 | 3rd Qu.: 90.87 | 3rd Qu.:36.90 | 3rd Qu.:0.3000 | 3rd Qu.: 7.20 | 3rd Qu.: 14.45 | 3rd Qu.:245.5 | 3rd Qu.: 8.858 | 3rd Qu.: 89.20 | 3rd Qu.: 20.60 | 3rd Qu.: 7.460 | 3rd Qu.:13.50 | 3rd Qu.:90.22 | 3rd Qu.:70.65 | 3rd Qu.: 0.0600 | 3rd Qu.: 95.25 | 3rd Qu.: 0.010 | 3rd Qu.: 92.75 | 3rd Qu.:39.70 | 3rd Qu.: 12.895 | 3rd Qu.: 9.100 | 3rd Qu.:350.5 | 3rd Qu.:5.145 | 3rd Qu.: 5.000 | 3rd Qu.: 9.700 | 3rd Qu.: 1.3458 | 3rd Qu.:6.563 | 3rd Qu.: 5.330 | 3rd Qu.:0.060000 | 3rd Qu.:2.430 | 3rd Qu.:4.626 | 3rd Qu.: 9.342 | 3rd Qu.:10.408 | 3rd Qu.: 6.900 | 3rd Qu.:11.40 | 3rd Qu.: 865.4 | 3rd Qu.: 43.55 | 3rd Qu.:17.50 | 3rd Qu.:25.700 | 3rd Qu.:0.08000 | 3rd Qu.:10.694 | 3rd Qu.:4.187 | 3rd Qu.: 41.000 | 3rd Qu.:320.4 | 3rd Qu.:25.30 | 3rd Qu.:2.193 | 3rd Qu.: 815 | 3rd Qu.: 590.7 | 3rd Qu.:35.50 | 3rd Qu.: 27.50 | 3rd Qu.: 7.30 | 3rd Qu.: 0.5358 | 3rd Qu.:14.03 | 3rd Qu.:35.97 | 3rd Qu.: 51.90 | 3rd Qu.:1.260 | 3rd Qu.:0.02000 | 3rd Qu.: 1.0000 | 3rd Qu.:32.00 | 3rd Qu.: 42.62 | 3rd Qu.:112.80 | 3rd Qu.:0.10000 | 3rd Qu.:142.3 | 3rd Qu.:0.2600 | 3rd Qu.: 40.00 | 3rd Qu.: 38.38 | 3rd Qu.:104.20 | 3rd Qu.: 95.00 | |
| Max. :35.00 | Max. :95.00 | NA | Max. :2020-03-04 16:21:51 | NA | Max. :50000.0 | Max. :178.0 | Max. :140.20 | Max. :83.35 | Max. :29.2400 | Max. :5.5500 | Max. :7500.0 | Max. :620.00 | Max. :46.30 | Max. :1.7000 | Max. :750.00 | Max. :390.85 | Max. :554.0 | Max. :35.200 | Max. :130.00 | Max. :3409.00 | Max. :102.400 | Max. :25.60 | Max. :98.80 | Max. :81.50 | Max. :11.9500 | Max. :142.00 | Max. :250.000 | Max. :116.15 | Max. :52.30 | Max. :370.930 | Max. :103.550 | Max. :488.0 | Max. :8.950 | Max. :88.500 | Max. :58.100 | Max. :43.0550 | Max. :7.565 | Max. :252.253 | Max. :0.380000 | Max. :2.790 | Max. :6.860 | Max. :28.975 | Max. :26.795 | Max. :288.450 | Max. :14.20 | Max. :50000.0 | Max. :100.10 | Max. :92.75 | Max. :52.350 | Max. :2.09000 | Max. :40.500 | Max. :6.160 | Max. :959.500 | Max. :993.0 | Max. :30.07 | Max. :2.580 | Max. :70000 | Max. :1867.0 | Max. :58.60 | Max. :5000.00 | Max. :190.80 | Max. :36.9400 | Max. :24.20 | Max. :48.20 | Max. :732.00 | Max. :5.450 | Max. :0.10000 | Max. : 1.0000 | Max. :50.80 | Max. :102.75 | Max. :320.00 | Max. :0.27000 | Max. :179.6 | Max. :0.5100 | Max. :106.00 | Max. :1061.00 | Max. :215.45 | Max. :1430.00 |
The following graph represents an overall comparison between patients who survived and those who did not.
The following graph represents an overall comparison between men and women.
The chart below shows mortality by gender.
The following chart presents an age histogram, split by gender.
The following chart shows mortality by age.
The following chart shows mortality by age and gender.
The following chart presents the number of days spent in the hospital, by gender.
The following chart shows how the probability of death changes with the number of days spent in the hospital.
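One simple way to estimate such a curve is to group patients by length of stay and take the share of deaths in each group. The Python sketch below assumes this approach. The report does not state how the plotted probability was computed, and the sample pairs are invented for illustration.

```python
from collections import Counter

def death_rate_by_days(patients):
    """Return {days: share of patients with that length of stay who died}."""
    total = Counter()
    deaths = Counter()
    for days, survived in patients:
        total[days] += 1
        if not survived:
            deaths[days] += 1
    return {days: deaths[days] / total[days] for days in sorted(total)}

# Hypothetical (DAYS_COUNT, HAS_SURVIVED) pairs, not taken from the dataset.
sample = [
    (3, False), (3, False), (3, True),
    (12, True), (12, True),
    (25, False), (25, True),
]
rates = death_rate_by_days(sample)
```

With few patients per day the estimates are noisy, so grouping the days into wider bins may give a more stable picture.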
The following graph shows how the number of survivors/deaths changed over time.
The following chapter focuses on the classification model. A Random Forest classifier was used.
## Confusion Matrix and Statistics
##
##           Reference
## Prediction ALIVE DEAD
##      ALIVE    50    1
##      DEAD      0   42
##
## Accuracy : 0.9892
## 95% CI : (0.9415, 0.9997)
## No Information Rate : 0.5376
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.9783
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 1.0000
## Specificity : 0.9767
## Pos Pred Value : 0.9804
## Neg Pred Value : 1.0000
## Prevalence : 0.5376
## Detection Rate : 0.5376
## Detection Prevalence : 0.5484
## Balanced Accuracy : 0.9884
##
## 'Positive' Class : ALIVE
##
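The headline statistics above can be reproduced from the four cells of the confusion matrix. The following Python sketch uses the standard definitions with ALIVE as the positive class. The function name `confusion_metrics` is an assumption for this example and is not part of caret.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard metrics for a 2x2 confusion matrix (positive class: ALIVE)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # share of truly ALIVE patients predicted ALIVE
    specificity = tn / (tn + fp)  # share of truly DEAD patients predicted DEAD
    # Agreement expected by chance, from the row and column totals.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": tp / (tp + fp),
        "neg_pred_value": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }

# Cells read from the matrix above: rows are predictions, columns are the reference.
metrics = confusion_metrics(tp=50, fn=0, fp=1, tn=42)
```

Rounded to four decimals, these values match the reported accuracy (0.9892), specificity (0.9767), positive predictive value (0.9804), balanced accuracy (0.9884) and kappa (0.9783).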
## rf variable importance
##
## only 20 most important variables shown (out of 78)
##
## Overall
## NEUTROPHILS... 28.893
## LACTATE_DEHYDROGENASE 18.645
## X...LYMPHOCYTE 16.168
## INTERNATIONAL_STANDARD_RATIO 10.332
## UREA 6.543
## PROTHROMBIN_TIME 6.249
## ALBUMIN 5.115
## NEUTROPHILS_COUNT 4.904
## AGE 4.512
## LYMPHOCYTE_COUNT 4.132
## DAYS_COUNT 2.842
## HIGH_SENSITIVITY_C.REACTIVE_PROTEIN 2.678
## DISCHARGE_TIME 2.404
## GLUCOSE 2.332
## PROCALCITONIN 1.964
## PLATELET_COUNT 1.655
## EOSINOPHILS... 1.439
## D.D_DIMER 1.392
## DIRECT_BILIRUBIN 1.149
## HYPERSENSITIVE_CARDIAC_TROPONINI 1.112